ABSTRACT


The general two-population discrimination problem is discussed
briefly under various situations. Two discrimination procedures
are discussed and compared: the linear discriminant function, and
a nonparametric procedure due to J. L. Hodges and E. Fix which
assigns a random variable to the population that has the nearest
observation to an observed value of the random variable. The
comparison is made by computing the probabilities of
misclassification for both procedures when the two populations
are normal with equal covariance matrices. Probabilities of
misclassification are also computed for the nonparametric
discriminator and the linear discriminant function for two small
sample sizes for the case when the two populations being
discriminated are exponential. In this latter case, both
discrimination procedures are shown to give high probabilities of
misclassification for certain values of the parameters of the
distributions being discriminated. Regions are given, in terms of
the parameters of the two exponential distributions, where one of
the probabilities of error is greater than 0.5. A more complete
investigation for larger sample sizes is recommended for the
linear discriminant function and the nonparametric procedure
discussed in this paper for the case when the two populations
being discriminated are exponential.


SECTION I


INTRODUCTION 


The two-population discrimination problem may be summarized as
follows: given a random variable Z distributed over some
p-dimensional space according to a distribution F, or according
to a distribution G, determine on the basis of an observation,
say z of Z, which of the two distributions Z has.


When F and G are completely known, the solution to the problem is
implicit in the Neyman-Pearson lemma. (1) The discrimination
depends on the ratio $f(z)/g(z)$, where f and g are the
respective density functions of F and G. The rule is as follows:

If $f(z)/g(z) > C$, decide in favor of F
If $f(z)/g(z) < C$, decide in favor of G
If $f(z)/g(z) = C$, the decision is arbitrary.

C is an appropriate positive constant chosen on the basis of
considerations relating to the importance of the two possible
errors:

(i) $P_1$ = P(Z is assigned to G | Z came from F)
(ii) $P_2$ = P(Z is assigned to F | Z came from G).

The two most widely advocated choices of C are:

(a) Take C = 1

(b) Choose C such that $P_1 = P_2$.

This procedure, known as the "likelihood ratio procedure," is
known to have optimum properties with regard to control of the
probability of misclassification.
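
The decision rule can be illustrated with a short sketch. It is
only an illustration under stated assumptions: the two densities
are taken to be univariate normal with unit variance, and the
names normal_pdf and likelihood_ratio_rule are hypothetical, not
taken from the references. The comparison is written as
f(z) > C g(z) so that a zero value of g(z) causes no division
error.

```python
import math


def normal_pdf(z, mean, sd=1.0):
    """Density of a normal distribution (assumed form for f and g)."""
    u = (z - mean) / sd
    return math.exp(-0.5 * u * u) / (sd * math.sqrt(2.0 * math.pi))


def likelihood_ratio_rule(z, f, g, c=1.0):
    """Decide 'F' when f(z)/g(z) > c and 'G' when f(z)/g(z) < c.

    When the ratio equals c the decision is arbitrary; this sketch
    returns 'F' in that case.
    """
    fz, gz = f(z), g(z)
    if fz > c * gz:
        return "F"
    if fz < c * gz:
        return "G"
    return "F"


# Example: F is N(0, 1), G is N(2, 1), and C = 1.
f = lambda z: normal_pdf(z, 0.0)
g = lambda z: normal_pdf(z, 2.0)
decision_low = likelihood_ratio_rule(0.5, f, g)
decision_high = likelihood_ratio_rule(1.5, f, g)
```

With C = 1 and equal variances the rule reduces to comparing z
with the midpoint of the two means, so 0.5 is assigned to F and
1.5 to G.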

When F and G are known except for the values of one or more
parameters, the procedure used is much the same as that just
described. Suppose that samples are available, say

$X_1, X_2, X_3, \ldots, X_m$ from F

$Y_1, Y_2, Y_3, \ldots, Y_n$ from G,

from which the unknown parameters, denoted collectively by
$\theta$, can be estimated. By some estimation procedure, we
estimate $\theta$ by $\hat\theta$ and assume that
$F_{\hat\theta}$ and $G_{\hat\theta}$ are the correct
distribution functions. The "likelihood ratio procedure" and the
decision rules outlined above can now be applied.

If it is assumed that F and G are p-variate normal distributions
having the same (unknown) covariance matrix and unknown
expectation vectors, the linear discriminant function is a good
example of this procedure. (2) The given samples are used to
estimate the covariance matrix and the expectation vectors, and
the "likelihood ratio procedure" is used under the assumption
that the estimated parameters are known to be correct. It is
known that under the normality assumption for F and G and the
homoscedastic assumption, the linear discriminant function is an
optimal procedure.

Although this procedure seems reasonable when the assumed
parametric form of the distributions is correct, there is concern
about its validity when the linear discriminant function is used
with data that are not normal or, if normal, have unequal
covariance matrices. In fact, in the normal situation when the
covariance matrices are not equal, a quadratic function can be
shown to be optimal. There is a need, then, for a reasonable
discrimination procedure whose validity does not require the
knowledge implied by the normality assumption, the homoscedastic
assumption, or any assumption about the parametric form.

Several classes of nonparametric discrimination procedures were
proposed in (3). These procedures were proven to have
asymptotically optimum properties for large samples. In (4), some
of these nonparametric procedures were investigated when the
samples were small. These procedures were compared with the
linear discriminant function where F and G were assumed normal
with equal covariance matrices, since under these assumptions the
linear discriminant function is known to be optimal. The
comparison was made by setting the probabilities of
misclassification when the linear discriminant function was used
against the probabilities of misclassification when the
nonparametric procedures were used. A survey of the procedures
and results of (4) is given in Section II of this paper.

In Section III of this paper, an investigation is made of the
performance of one of the nonparametric discriminators discussed
in (4) and of the performance of the linear discriminant function
when F and G are not normal but, in fact, exponential with
parameters $\lambda$ and $\mu$ respectively. The exponential
distribution was selected because of the role it plays in the
field of life testing and other applied problems. It is shown
that for sample sizes of 1 and 2, both the nonparametric
discriminator and the linear discriminant function give very poor
results for certain values of $\lambda$ and $\mu$.

Detailed conclusions and recommendations made on the basis of the
results attained in Sections II and III are contained in Section
IV of this paper.

Professors R. R. Read and J. R. Borsting, of the U. S. Naval
Postgraduate School, have generously given their time to provide
direction, encouragement and valuable advice to the author in the
writing of this paper.


SECTION  II 


PERFORMANCE  OF  THE  LINEAR  DISCRIMINANT  FUNCTION 
AND  A  CLASS  OF  NONPARAMETRIC  DISCRIMINATORS 
WHEN  THE  TWO  POPULATIONS  BEING  DISCRIMINATED 
HAVE  NORMAL  DISTRIBUTIONS  WITH 
EQUAL  COVARIANCE  MATRICES 

Let $X_1, X_2, \ldots, X_m$ be a sample from a p-variate
distribution F and let $Y_1, Y_2, \ldots, Y_n$ be a sample from a
p-variate distribution G. It is assumed further that the
parametric forms of F and G are unknown. If z is an observation
of a random variable Z known to be distributed either as F or as
G, how is it decided on the basis of z to which population Z
belongs? Define a distance function (in p-dimensional space)
which will permit a ranking of the m+n observations according to
their "nearness" to z. The idea of the discrimination procedures
outlined in (3) is to assign Z to the population which has the
most observations nearest to z. Specifically, choose an odd
integer, k, and assume for simplicity that m=n; then Z is
assigned to the distribution from which came the majority of the
k nearest observations.

In (3), it was shown that several classes of these nonparametric
discriminators have asymptotically optimum performance as
$m \to \infty$ and $n \to \infty$ at the same rate. By optimum
performance, it is meant that the probabilities of
misclassification $P_1$ and $P_2$, as defined in the
introduction, tend to the theoretical minimum values which they
could have if F and G were completely known.

The asymptotic properties and the simplicity of applying the
procedures of this class of nonparametric discriminators suggest
that procedures of this type might be a reasonable alternative to
the commonly applied linear discriminant function. However, to
propose an alternative to the linear discriminant function solely
on the basis of asymptotic properties and ease of application
would not be entirely reasonable. In particular, the small sample
performance of such nonparametric discriminators needs
investigation to ascertain how much discrimination power is lost
when F and G are known to be normal with equal covariance
matrices, so that the linear discriminant function is
appropriate. One way this investigation can be accomplished is by
comparing the probabilities of misclassification when the linear
discriminant function is used with the corresponding
probabilities of misclassification when the nonparametric
discriminators are used. Such an investigation was made in (4).
The remainder of Section II is devoted to summarizing the
procedures and results of (4).

It is first pointed out that the problem can be reduced
considerably by considering linear transformations in the
observation space. It is always possible by such transformations
to insure that F and G will have the identity covariance matrix.
In other words, in the new space, the p transformed measurements
are independent in each population and each measurement has unit
variance. It is also possible by such transformations to put the
expectation vector of the F population at the origin and the
expectation vector of the G population on the positive first
axis. This allows complete specification of the transformed
populations by the two parameters p and $\lambda$, where

$\lambda$ = E(first coordinate of Y)

= distance between the means of the transformed populations.

In performing such linear transformations, $P_1$ and $P_2$ for
the linear discriminant function are unchanged. The probabilities
$P_1$ and $P_2$ for the nonparametric discriminators are likewise
unchanged, since such linear transformations map the totality of
distance functions one-one into the totality in the new space.

It is assumed that the sizes of the samples taken from each
population are equal, m=n. In the main, the distance function
used is

$$\Delta(x, z) = \max_{i=1,\ldots,p} |x_i - z_i| .$$

It should be pointed out that $\Delta$ is just one of a large
class of distance functions, any one of which could be used. This
fact is mentioned since the probabilities $P_1$ and $P_2$ depend
very heavily on the distance function chosen. Most of the
computations are made using k=1, that is, assign Z to the
population F or G from which came the individual of the pooled
samples which most closely resembles Z.
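
The rule of the k nearest observations with the maximum-coordinate
distance can be sketched as follows. This is a minimal
illustration, not the implementation used in (3) or (4); the
function names and the example samples are hypothetical.

```python
from collections import Counter


def max_distance(x, z):
    """Largest absolute coordinate difference between points x and z."""
    return max(abs(xi - zi) for xi, zi in zip(x, z))


def nearest_neighbor_rule(z, sample_f, sample_g, k=1, distance=max_distance):
    """Assign z to the population supplying most of its k nearest observations.

    k is assumed odd, so a majority always exists between two labels.
    """
    ranked = [(distance(x, z), "F") for x in sample_f]
    ranked += [(distance(y, z), "G") for y in sample_g]
    ranked.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical bivariate samples of equal size (m = n = 3).
sample_f = [(0.0, 0.0), (0.5, -0.2), (-1.0, 0.3)]
sample_g = [(3.0, 0.0), (2.5, 0.4), (3.2, -0.1)]
```

Passing a different function as distance shows how strongly the
assignment can depend on the distance function chosen.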

The first case considered is the univariate case, p=1. Using the
rule of the "nearest neighbor," that is, k=1, and the distance
function $\Delta = |x-z|$, which corresponds to ordinary
Euclidean distance in this case, the probabilities $P_1$ and
$P_2$ are computed for various values of n and $\lambda$.

For p=1, the linear discriminant function is greatly simplified
since no matrix computation enters. The arithmetic mean
$(\bar X + \bar Y)/2$ of the sample means is computed and Z is
assigned to that population whose sample mean lies on the same
side of $(\bar X + \bar Y)/2$ as does Z itself. The probabilities
of misclassification are now readily computed.
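
For p=1 the rule just described fits in a few lines. The sketch
below is illustrative only; the function name univariate_ldf is
hypothetical, and the handling of an observation falling exactly
on the midpoint (assigned to F here) is an arbitrary choice.

```python
def univariate_ldf(z, sample_f, sample_g):
    """Assign z to the population whose sample mean is on z's side of the midpoint."""
    xbar = sum(sample_f) / len(sample_f)
    ybar = sum(sample_g) / len(sample_g)
    midpoint = (xbar + ybar) / 2.0
    # A positive product means z and that sample mean lie on the same side.
    if (z - midpoint) * (xbar - midpoint) > 0:
        return "F"
    if (z - midpoint) * (ybar - midpoint) > 0:
        return "G"
    return "F"  # z on the midpoint, or equal sample means: arbitrary
```

The rule does not require the F sample mean to be the smaller
one; only the side of the midpoint matters.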

From the symmetry of the problem, $P_1 = P_2$, so it is
sufficient to compute $P_1$; thus, it is assumed that Z is
distributed according to the F distribution. As was pointed out
previously, linear transformations make it possible to put
$E(X)=0$, $E(Y)=\lambda>0$ and $\sigma_X^2 = \sigma_Y^2 = 1$ with
no loss of generality.

An error is committed by the linear discriminant function if and
only if

(i) $Z > (\bar X + \bar Y)/2$ and $\bar Y > \bar X$, or
(ii) $Z < (\bar X + \bar Y)/2$ and $\bar Y < \bar X$.

Define $U = \bar Y - \bar X$ and $V = \bar X + \bar Y - 2Z$. It
is easily shown that U and V are independent normal random
variables with $E(U)=\lambda$, $\sigma_U^2 = 2/n$,
$E(V)=\lambda$, $\sigma_V^2 = 4 + 2/n$. In terms of the variables
U and V, an error is committed by the linear discriminant
function if and only if UV<0. Thus it follows for the linear
discriminant function when p=1 that

$$P_1 = \Phi\!\left(\frac{\sqrt{n}\,\lambda}{\sqrt{2}}\right)
        \Phi\!\left(-\frac{\sqrt{n}\,\lambda}{\sqrt{4n+2}}\right)
      + \Phi\!\left(-\frac{\sqrt{n}\,\lambda}{\sqrt{2}}\right)
        \Phi\!\left(\frac{\sqrt{n}\,\lambda}{\sqrt{4n+2}}\right),$$

where

$$\Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt .$$

Since $\lim_{n \to \infty} P_1 = \Phi(-\lambda/2)$, it is
observed that the maximum probability of misclassification is .5.
The values of $P_1 = P_2$ for various values of n and $\lambda$
are given in Table 1. Figures 1 and 2 give these results
graphically. All Tables and Figures in Section II have been
reproduced from (4).
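
The error probability P(UV < 0) for independent normal U and V
can be evaluated directly from the means and variances stated
above. The following is a sketch under those assumptions only;
the names phi and ldf_error_normal are illustrative, and lam
stands for the distance between the means.

```python
import math


def phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ldf_error_normal(n, lam):
    """P(UV < 0) with U ~ N(lam, 2/n) and V ~ N(lam, 4 + 2/n), independent."""
    a = lam / math.sqrt(2.0 / n)        # E(U) / sd(U)
    b = lam / math.sqrt(4.0 + 2.0 / n)  # E(V) / sd(V)
    # UV < 0 means (U > 0, V < 0) or (U < 0, V > 0).
    return phi(a) * phi(-b) + phi(-a) * phi(b)
```

For n = 1 the linear discriminant function and the nearest
neighbor rule coincide, so ldf_error_normal(1, lam) can be set
beside the n = 1 row of Table 2, and for large n the value
approaches phi(-lam / 2).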

We consider now the nonparametric discriminator using the "rule
of the nearest neighbor," k=1, which consists of assigning Z to
that population from which came the sample individual nearest to
z. Suppose that Z=z. Let $P_1(z)$ denote the conditional
probability that the nearest of the 2n sample observations to z
is a given $y_i$. Then,

$$P_1 = n\,E\big(P_1(Z)\big)
      = n \int_{-\infty}^{\infty} P_1(z)\,
        \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\, dz . \qquad (1)$$

TABLE 1

PROBABILITY OF ERROR, LINEAR DISCRIMINANT FUNCTION,
UNIVARIATE NORMAL DISTRIBUTIONS

[Table entries are not legible in the source.]

n = size of sample taken from each population
$\lambda$ = distance between the means of the two populations
Probability of error = P(Z is assigned to G | Z came from F)
                     = P(Z is assigned to F | Z came from G)

[Figure 1 not reproduced; curves labelled $\lambda = 1$ and
$\lambda = 2$.]

FIGURE 1

Probability of error of the linear discriminant function for two
univariate normal distributions with distance between means =
$\lambda$.
n = size of sample from each population.

FIGURE 2

Probability of error $P_1$ of the linear discriminant function
for two univariate normal distributions with distance between the
means = $\lambda$, plotted as a function of $\lambda$.
n = size of sample from each population.

It remains then to calculate $P_1(z)$. Define

$$H_z(\delta) = P(|X - z| < \delta), \qquad \delta > 0$$
$$\phantom{H_z(\delta)} = P(z - \delta < X < z + \delta)$$
$$\phantom{H_z(\delta)} = \Phi(z + \delta) - \Phi(z - \delta),$$

and

$$K_z(\delta) = P(|Y - z| < \delta)$$
$$\phantom{K_z(\delta)} = P(z - \lambda - \delta < Y - \lambda < z - \lambda + \delta)$$
$$\phantom{K_z(\delta)} = \Phi(z - \lambda + \delta) - \Phi(z - \lambda - \delta).$$

The event "the nearest sample value to z is a y" can be
classified into the n exclusive equiprobable events "the nearest
sample value to z is $y_i$," i = 1, 2, ..., n. Since the nearest
y to z will necessarily be the y at minimum distance from z, it
is necessary to compute the probability density function for the
minimum of $|Y_1 - z|, |Y_2 - z|, \ldots, |Y_n - z|$. Since the
$|Y_i - z|$, i = 1, 2, ..., n, are independent identically
distributed random variables, this density function is easily
shown to be

$$n\,\big(1 - K_z(\delta)\big)^{n-1}\, dK_z(\delta).$$

$P_1(z)$ is then computed by the following formula:

$$P_1(z) = \frac{1}{n} \int_0^{\infty}
           \big(1 - H_z(\delta)\big)^{n}\;
           n\,\big(1 - K_z(\delta)\big)^{n-1}\, dK_z(\delta). \qquad (2)$$

Formulae (1) and (2) form the basis for all the computations for
the "nearest neighbor rule" no matter what the value of p, if for
p > 1 one replaces $P(|X - z| < \delta)$ by P(the distance of X
from z < $\delta$) and similarly $P(|Y - z| < \delta)$ by P(the
distance of Y from z < $\delta$). Of course the specific
evaluations depend upon the distance function used.
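
Formulae (1) and (2) can be evaluated by straightforward
numerical integration. The sketch below uses the midpoint rule in
both variables for the univariate normal case; the grid limits,
step counts and function names are assumptions chosen for
illustration, not the settings used in (4), and lam stands for
the distance between the means.

```python
import math


def phi_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def phi_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def p1_given_z(z, n, lam, d_max=12.0, steps=600):
    """Formula (2): probability that one particular y_i is nearest to z."""
    h = d_max / steps
    total = 0.0
    for j in range(steps):
        d = (j + 0.5) * h  # midpoint of the j-th delta interval
        big_h = phi_cdf(z + d) - phi_cdf(z - d)
        big_k = phi_cdf(z - lam + d) - phi_cdf(z - lam - d)
        dk = phi_pdf(z - lam + d) + phi_pdf(z - lam - d)  # dK/d(delta)
        total += (1.0 - big_h) ** n * (1.0 - big_k) ** (n - 1) * dk
    return total * h


def nn_error_normal(n, lam, z_max=7.0, steps=280):
    """Formula (1): P_1 = n * E[P_1(Z)] with Z distributed as N(0, 1)."""
    h = 2.0 * z_max / steps
    total = 0.0
    for j in range(steps):
        z = -z_max + (j + 0.5) * h
        total += p1_given_z(z, n, lam) * phi_pdf(z)
    return n * total * h
```

Values from nn_error_normal(n, lam) can be set beside the entries
of Table 2; a finer grid trades running time for accuracy.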
Except for the case p=1, n=1, in which case $P_1$ and $P_2$ are
the same for the linear discriminant function and the
nonparametric discriminator, the bulk of the computations for the
nonparametric discriminator were carried out by straightforward
numerical integration. These computations are given in Table 2.
These computations are quite heavy, especially for the case p=2.
Therefore, a search for an approximation formula for the
computation of $P_1(z)$ was instituted. One approximation formula
was found which gave very good results. A discussion of this
approximation formula is given in (4). $P_1$ as computed using
the approximation formula for $P_1(z)$ is tabled in Table 2-A.
One very interesting result which was obtained using the
approximation formula for $P_1(z)$ was that for large n,

$$P_1 = P_2 = \int_{-\infty}^{\infty}
              \frac{f(z)\,g(z)}{f(z) + g(z)}\, dz .$$

An application of Schwarz's inequality shows the latter integral
to be at most 0.5. It is thus possible to assert that, whatever
be the populations being discriminated, the "rule of the nearest
neighbor" will have in the limit as $m = n \to \infty$ equal
probabilities of error at most 0.5.
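
The limiting integral is easy to evaluate numerically for any
pair of densities. The sketch below does so with the midpoint
rule for two unit-variance normal densities; the function names
and integration limits are illustrative assumptions. One
elementary way to see the bound of 0.5 is that
fg/(f+g) <= (f+g)/4 pointwise, and (f+g)/4 integrates to 0.5.

```python
import math


def normal_pdf(z, mean):
    """Unit-variance normal density (assumed form for f and g)."""
    return math.exp(-0.5 * (z - mean) ** 2) / math.sqrt(2.0 * math.pi)


def limiting_nn_error(f, g, lo, hi, steps=20000):
    """Midpoint-rule value of the integral of f*g/(f+g) over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for j in range(steps):
        z = lo + (j + 0.5) * h
        fz, gz = f(z), g(z)
        if fz + gz > 0.0:  # skip points where both densities vanish
            total += fz * gz / (fz + gz)
    return total * h


def limit_for_normal(lam):
    """Limiting error for N(0, 1) against N(lam, 1)."""
    return limiting_nn_error(lambda z: normal_pdf(z, 0.0),
                             lambda z: normal_pdf(z, lam),
                             -10.0, 13.0)
```

The values for distances 1, 2 and 3 between the means can be
compared with the last row of Table 2-A.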

To compare the figures of Tables 1, 2, and 2-A, the values of
$P_1 = P_2$ for paired values of $\lambda$ are plotted against n
in Figure 3. In Figure 4, the same values are plotted against
$\lambda$ for selected values of n.

TABLE 2

PROBABILITY OF ERROR, NONPARAMETRIC DISCRIMINATOR
WITH k=1, UNIVARIATE NORMAL DISTRIBUTION

   n      λ=1      λ=2      λ=3
   1     .4175    .2532    .1235
   2     .4086    .2364    .1084
   3     .4052    .2307    .1036
   4     .4032    .2280    .1014

TABLE 2-A

APPROXIMATE PROBABILITY OF ERROR, NONPARAMETRIC
DISCRIMINATOR WITH k=1, UNIVARIATE NORMAL DISTRIBUTION

   n      λ=1      λ=2      λ=3
   4     .403     .226     .102
   5     .401     .225     .100
  10     .399     .223     .098
  20     .398     .224     .098
  50     .398     .225     .098
   ∞     .398     .225     .098

n = size of sample from each population
λ = distance between the means of the two populations
Probability of error = P(Z is assigned to G | Z came from F)
                     = P(Z is assigned to F | Z came from G)

FIGURE 3

Comparison of the probability of error as a function of n for the
linear discriminant function and the nonparametric discriminator,
distance function $\Delta$, k=1, for two normal univariate
populations with distance between means = $\lambda$.
n = size of sample from each population

FIGURE 4

Comparison of the probability of error $P_1$ as a function of
$\lambda$, the distance between the means, for the linear
discriminant function and the nonparametric discriminator,
distance function = $\Delta$, k=1, for two normal univariate
populations.

n = size of sample from each population
n = 1 is identical for both
--- indicates the nonparametric procedure

Not discussed in this paper, but investigated to a very limited
extent in (4), are the following cases:

(i) the nonparametric discriminator using $\Delta$ as a distance
function with $k \ge 3$ for the univariate and bivariate normal
distributions

(ii) the nonparametric discriminator using $\Delta$ as a distance
function, k = 1, n = 1, for $p \ge 2$

(iii) the effect of distance functions other than $\Delta$ on the
probabilities of misclassification for the bivariate normal
distribution.

Although the investigation of the above cases was extremely
limited due to the laborious computations, the results that were
obtained indicated that the nonparametric discrimination
procedure gave "reasonable" error probabilities in both cases (i)
and (ii). In the bivariate normal distribution, different
distance functions produced vastly different error probabilities
in some situations.


SECTION III


PERFORMANCE  OF  THE  LINEAR  DISCRIMINANT  FUNCTION 
AND  A  CLASS  OF  NONPARAMETRIC  DISCRIMINATORS 
WHEN  F  AND  G  ARE  EXPONENTIALLY  DISTRIBUTED 

In this section, a limited investigation of the linear
discriminant function and the nonparametric discriminator using
$\Delta$ as a distance function and using "the rule of the
nearest neighbor," k=1, is made when F and G are not normally
distributed but, in fact, exponentially distributed with
parameters $\lambda$ and $\mu$ respectively. The performance of
both the linear discriminant function and the nonparametric
discriminator will be investigated again by computing the
probabilities of misclassification. Under the assumption that F
and G are exponentially distributed, it will be shown that the
linear discriminant function and the nonparametric discriminator
using $\Delta$ as a distance function and "the rule of the
nearest neighbor" can give high probabilities of
misclassification.

Throughout the remainder of the section, it will be assumed that
m = n and that F and G are exponentially distributed with
parameters $\lambda$ and $\mu$ respectively. Because of the heavy
computations involved in computing the probabilities of
misclassification,

(i) $P_1$ = P(assigning Z to G | Z came from F)

(ii) $P_2$ = P(assigning Z to F | Z came from G),

the only cases investigated will be for p=1 and n=1, 2.

$P_1$ and $P_2$ will first be computed for the linear
discriminant function. The procedure here is precisely that which
was used in Section II for p = 1. One simply computes the
arithmetic mean $(\bar X + \bar Y)/2$ of the sample means and
assigns Z to that population whose sample mean lies on the same
side of $(\bar X + \bar Y)/2$ as does Z itself. While
$P_1 \ne P_2$, it is only necessary to compute $P_1$ since $P_2$
can readily be computed from $P_1$ by interchanging $\lambda$ and
$\mu$.

Proceeding as in Section II, define the new variables
$U = \bar Y - \bar X$ and $V = \bar X + \bar Y - 2Z$. If U and V
are to be independent, it is necessary that the covariance of U
and V be zero. Computing the covariance of U and V we have:

$$\mathrm{Cov}(U, V) = \frac{1}{n}\left[\frac{1}{\mu^2} - \frac{1}{\lambda^2}\right]
  \ne 0 \quad \text{except for } \lambda = \mu .$$

Since discrimination is not possible for $\lambda = \mu$,
Cov(U,V) will not be zero and in general U and V will not be
independent. As before, an error is committed by the linear
discriminant function if and only if

(i) $Z > (\bar X + \bar Y)/2$ and $\bar Y > \bar X$, or
(ii) $Z < (\bar X + \bar Y)/2$ and $\bar Y < \bar X$.

In terms of the variables U and V, an error is committed if and
only if UV < 0, and therefore,

$$P_1 = P(UV < 0).$$

Since U and V are not in general independent, the probability
that UV < 0 is not easily computed. It is necessary to compute
the joint density function for U and V and integrate over the
region where UV < 0. The joint density function of U and V was
computed but because of the complex nature of this function, it
was considered easier to compute $P_1$ directly. By (i) and (ii)
and the definition of $P_1$ it follows that

$$P_1 = P\big(Z > (\bar X + \bar Y)/2,\ \bar Y > \bar X\big)
      + P\big(Z < (\bar X + \bar Y)/2,\ \bar Y < \bar X\big).$$

Let $T = n\bar Y$ and $S = n\bar X$; thus

$f_T$ is the gamma density function with parameters n and $\mu$,
$f_S$ is the gamma density function with parameters n and $\lambda$.

Since T, S, and Z are independent random variables,

$$P_1 = \int_0^{\infty}\!\int_s^{\infty}\!\int_{(s+t)/2n}^{\infty}
        f_Z(z)\, f_T(t)\, f_S(s)\, dz\, dt\, ds
      + \int_0^{\infty}\!\int_0^{s}\!\int_0^{(s+t)/2n}
        f_Z(z)\, f_T(t)\, f_S(s)\, dz\, dt\, ds .$$

$P_1$ can now be computed by direct numerical integration. For
n=1, $P_1$ as a function of $\lambda$ and $\mu$ is

$P_1(\lambda, \mu)$ = [expression not legible in the source].

By interchanging $\lambda$ and $\mu$, $P_2$ is:

$$P_2(\lambda, \mu) = P_1(\mu, \lambda).$$

Recognizing that the numerator and denominator in the expressions
for $P_1$ and $P_2$ are homogeneous of degree 3 in $\lambda$ and
$\mu$, $P_1$ and $P_2$ can be expressed in terms of a single
parameter c by setting $\lambda = c\mu$. Making this substitution
in the expressions for $P_1$ and $P_2$, we have

$P_1(c)$, $P_2(c)$ = [expressions not legible in the source].

For n=1, $P_1$ and $P_2$ for the linear discriminant function are
the same as $P_1$ and $P_2$ for the nonparametric discriminator
using $\Delta$ as a distance function and "the nearest neighbor
rule, k=1."

For n=2, the substitution $\lambda = c\mu$ is again appropriate,
and $P_1$ and $P_2$ for n=2 are as follows:

$P_1(c)$ = [expression only partly legible in the source; the
surviving fragments show the denominators $(c+4)^2(3c+2)^2$ and
$(c+1)^3$]

$$P_2(c) = P_1(1/c).$$

Values of $P_1$ and $P_2$ for the linear discriminant function
for n=1 and 2 are tabled for various values of c in Table 3.
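
Closed forms for the linear discriminant function with n = 1 and
n = 2 can be re-derived by carrying out the integration over the
sample totals. The sketch below is such a re-derivation and
should not be read as the author's own expressions: it assumes
that $\lambda$ and $\mu$ are rate parameters (density
$\lambda e^{-\lambda x}$) and sets $\mu = 1$, $\lambda = c$. The
function names are hypothetical. For n = 1 the result is
$(10c^2 + 15c + 2) / (3(c+1)(c+2)(2c+1))$.

```python
def ldf_p1_n1(c):
    """Re-derived P_1 for n = 1, with lambda = c * mu (rates)."""
    return (10.0 * c * c + 15.0 * c + 2.0) / (
        3.0 * (c + 1.0) * (c + 2.0) * (2.0 * c + 1.0))


def ldf_p1_n2(c):
    """Re-derived P_1 for n = 2, with lambda = c * mu (rates)."""
    a = c + 4.0
    b = 3.0 * c + 2.0
    # P(Z above the midpoint and Ybar > Xbar)
    above_and_y_larger = 64.0 * c * c * (1.0 / (a * b ** 3) + 1.0 / (a * a * b * b))
    # P(Ybar < Xbar)
    y_smaller = (3.0 * c + 1.0) / (c + 1.0) ** 3
    # P(Z above the midpoint and Ybar < Xbar)
    above_and_y_smaller = 64.0 * c / (5.0 * b ** 3) + 64.0 / (25.0 * b * b)
    return above_and_y_larger + y_smaller - above_and_y_smaller


def p2_from_p1(p1, c):
    """P_2(c) = P_1(1/c): interchange the two parameters."""
    return p1(1.0 / c)
```

Under these assumptions the functions agree with most entries of
Table 3 to four decimals. For n = 2 and c = 2 the sketch gives a
value near 0.364 rather than the tabulated .3736, which may
indicate transposed digits in the source. For n = 1 the sketch
puts $P_2(c) = 0.5$ at c = 1 and at
$c = (7 + \sqrt{97})/4 \approx 4.212$.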

$P_1$ and $P_2$ are next computed for the nonparametric
discriminator for the case n=2. The procedure used is exactly the
procedure used in Section II. The substitution $\lambda = c\mu$
is once more appropriate. $P_1$ and $P_2$ in terms of the single
parameter c are as follows:

$P_1(c)$ = [expression, rational in c and stated separately for
two cases of c, not legible in the source]

$$P_2(c) = P_1(1/c).$$

Values of $P_1$ and $P_2$ for various values of c for the
nonparametric discriminator with n=2 are given in Table 3.
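
The error probabilities of the nearest neighbor rule for
exponential populations can also be estimated by simulation. The
sketch below is a Monte Carlo illustration, not the exact
computation of the text: it assumes the parameters are rates, and
the function name, trial count and seed are arbitrary choices.

```python
import random


def nn_error_exponential(rate_true, rate_other, n, trials=100_000, seed=0):
    """Estimate P(nearest observation comes from the wrong population).

    Z and one sample of size n come from the exponential population
    with rate rate_true; the other sample has rate rate_other.
    """
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        z = rng.expovariate(rate_true)
        nearest_own = min(abs(rng.expovariate(rate_true) - z) for _ in range(n))
        nearest_other = min(abs(rng.expovariate(rate_other) - z) for _ in range(n))
        if nearest_other < nearest_own:
            errors += 1
    return errors / trials
```

With $\mu = 1$ and $\lambda = c$, nn_error_exponential(c, 1.0, 2)
estimates $P_1(c)$ and nn_error_exponential(1.0, c, 2) estimates
$P_2(c)$, so the estimates can be set beside the nonparametric
rows of Table 3. For n = 1 the rule coincides with the linear
discriminant function.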

It is observed in Table 3 that $P_1$ and $P_2$ exceed 0.5 for
numerous values of c. Because of this observation, an
investigation was made to determine the values of c for which
$P_1$ and $P_2$ exceed 0.5. Figure 5 displays graphically the
regions in the $\lambda$, $\mu$ plane where $P_1$ and $P_2$ are
greater than 0.5.

Figure 5 points out only too well that great caution should be
used when applying the linear discriminant in situations when the
populations are other than normal.

TABLE 3

PROBABILITIES OF ERROR, UNIVARIATE
EXPONENTIAL DISTRIBUTIONS

                                            c
                              1      2      3      4      5      10
Linear Discriminant    P_1  .5000  .4000  .3262  .2741  .2360  .1385
Function*, n=1         P_2  .5000  .5333  .5214  .5037  .4870  .4329

Linear Discriminant    P_1  .5000  .3736  .2652  .2009  .1567  .0627
Function, n=2          P_2  .5000  .5299  .5041  .4782  .4577  .4056

Nonparametric          P_1  .5000  .4222  .3559  .3066  .2692  .1675
Discriminator, n=2     P_2  .5000  .5003  .4666  .4295  .3706  .3281

c is a parameter such that λ = cμ
λ is the parameter of the F population
μ is the parameter of the G population
P_1 = P(assigning Z to G | Z came from F)
P_2 = P(assigning Z to F | Z came from G)
n = sample size

*For n=1, the probabilities of error P_1 and P_2 for the linear
discriminant function are equal to the corresponding
probabilities of error P_1 and P_2 for the nonparametric
discriminator.

[Figure 5 not reproduced. Panels: Linear discriminant function*,
n = 1; Linear discriminant function, n = 2; Nonparametric
discriminator, n = 2. One boundary carries a label that appears
to read c = 4.2122 (only partly legible).]

FIGURE 5

Values of $\lambda$ and $\mu$ for which $P_1$ and $P_2$ exceed 0.5
c = parameter such that $\lambda = c\mu$
$\lambda$ = parameter of F distribution
$\mu$ = parameter of G distribution
$P_1$ = P(Z is assigned to G | Z came from F)
$P_2$ = P(Z is assigned to F | Z came from G)
n = sample size

*Linear discriminant function is equivalent to the nonparametric
discriminator for n = 1.


SECTION IV


SUMMARY  AND  CONCLUSIONS 


In any discrimination problem one has a choice between using
parametric or nonparametric procedures. This choice in general
will depend upon three factors:

(i) the strength of the user's belief in his parametric model.

(ii) the loss that would be suffered by using the nonparametric
rule if in fact the parametric form is correct.

(iii) the loss that would be suffered by using the parametric
rule if the actual densities depart from the parametric form
assumed.

For the two-population discrimination problem, Section II of this
paper concerned itself with (ii). In Section II, it was assumed
that the two populations being discriminated were normal with
equal covariance matrices. For the univariate case, the
parametric procedure used was the well-known linear discriminant
function, which is known to be optimal in this situation. The
nonparametric procedure used was the rule whereby a random
variable was classified as belonging to the population which had
the nearest observation to an observed value of the random
variable being classified. A comparison of these two procedures
was made by computing and comparing the probabilities of
misclassification.

Also for the two-population discrimination problem, an
investigation of the linear discriminant function and the same
nonparametric procedure was carried out in Section III of this
paper when the two populations were not normal but exponential.
Again the investigation was made by computing the probabilities
of misclassification for both procedures. Because of the lengthy
computations involved in computing the probabilities of error for
both of these procedures, the only cases considered were the
univariate case for sample sizes of 1 and 2. It was shown that
for these two cases both procedures could give poor results,
depending on the parameters of the distributions.

In conclusion, it seems reasonable that if the populations to be
discriminated are well known, and have been investigated to be
such that the normal distribution gives a good fit and that the
variance and correlation do not change much when the means are
changed, and if the classification to be made warrants the labor
of matrix inversion, then the linear discriminant function should
be used. However, if the populations are either not well known,
or are known not to be approximately normal or to have very
different covariance matrices, or if the discrimination is such
that small decreases in probability of error are not worth
extensive computations, then a nonparametric procedure seems to
be advisable. Which nonparametric procedure is a matter of choice
for the user.

Recommendations to be made on the basis of this paper are:

(i) tabulate the probabilities of error for the linear
discriminant function in representative situations for the case
where the populations being discriminated are multivariate normal
with equal covariance matrices.

(ii) further investigation (for larger sample sizes) of the
linear discriminant function in the case where the populations
being discriminated are exponential, because of the importance of
the exponential distribution in the field of life testing and
other applied problems.

(iii) investigation as to the effect of other distance functions
for the nonparametric discriminator discussed in this paper in
the case when the populations being discriminated are exponential
or some other class of distributions.